Project 1 - DualLens Analytics¶
Background Story¶
In the rapidly evolving world of finance and technology, investors are constantly seeking ways to make smarter decisions by combining traditional financial analysis with emerging technological insights. While stock market trends provide a numerical perspective on growth, an organization’s initiatives in cutting-edge fields like Artificial Intelligence (AI) reveal its future readiness and innovation potential. However, analyzing both dimensions - quantitative financial performance and qualitative AI initiatives - requires sifting through multiple, diverse data sources: stock data from platforms like Yahoo Finance, reports in PDFs, and contextual reasoning using Large Language Models (LLMs).
This is where DualLens Analytics comes in. By applying a dual-lens approach, the project leverages Retrieval-Augmented Generation (RAG) to merge financial growth data with strategic insights from organizational reports. Stock data provides evidence of stability and momentum, while AI initiative documents reveal forward-looking innovation. Together, they form a richer, more holistic picture of organizational potential.
With DualLens Analytics, investors no longer need to choose between numbers and narratives—they gain a unified, AI-driven perspective that ranks organizations by both financial strength and innovation readiness, enabling smarter, future-focused investment strategies.
Problem Statement¶
Traditional investment analysis often focuses on financial metrics alone (e.g., stock growth, revenue, market cap), missing the qualitative dimension of how prepared a company is for the future. On the other hand, qualitative documents like strategy PDFs contain valuable insights about innovation and AI initiatives, but they are difficult to structure, query, and integrate with numeric financial data.
This leads to three core challenges:
Fragmented Data Sources: Financial data (stock prices) and strategic insights (PDFs) exist in silos.
Limited Analytical Scope: Manual analysis of growth trends and PDF reports is time-consuming and error-prone.
Decisional Blind Spots: Without integrating both quantitative (growth trends) and qualitative (AI initiatives) signals, investors may miss out on high-potential organizations.
Solution Approach¶
To address this challenge, we set out to build a Retrieval-Augmented Generation (RAG) powered system that blends financial trends with AI-related strategic insights, helping investors rank organizations based on growth trajectory and innovation capacity.
NOTE
Look for the "--- --- ---" markers in the notebook; these are placeholders where you need to add your code.
Setting up Installations and Imports¶
# @title Run this cell => Restart the session => Start executing the below cells **(DO NOT EXECUTE THIS CELL AGAIN)**
!uv pip install langchain==0.3.25 \
langchain-core==0.3.65 \
langchain-openai==0.3.24 \
chromadb \
langchain-community==0.3.20 \
pypdf==5.4.0
import yfinance as yf # Used for gathering stock prices
import matplotlib.pyplot as plt # Used for Data Visualization / Plots / Graphs
import pandas as pd # Helpful for working with tabular data like DataFrames
import os # Interacting with the operating system
from langchain.text_splitter import RecursiveCharacterTextSplitter # Helpful in splitting the PDF into smaller chunks
from langchain_community.document_loaders import PyPDFDirectoryLoader, PyPDFLoader # Loading a PDF
from langchain_community.vectorstores import Chroma # Vector DataBase
1. Organization Selection¶
Selecting the five organizations below as the analysis pool.
companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]
2. Setting up LLM - 1 Mark¶
- The config.json file should contain the API_KEY and API BASE URL provided by OpenAI.
- You need to insert your actual API key and endpoint URL obtained from your Olympus account. Refer to the OpenAI Access Token documentation for more information on how to generate and manage your API keys.
- This code reads the config.json file and extracts the API details.
- The API_KEY is a unique secret key that authorizes your requests to OpenAI's API.
- The OPENAI_API_BASE is the API BASE URL where the model will process your requests.

What To Do?
- Use the sample config.json file provided.
- Add your OpenAI API Key and Base URL to the file.

The config.json should look like this:
{
  "API_KEY": "your_openai_api_key_here",
  "OPENAI_API_BASE": "https://your_openai_api_base/v1"
}
#Loading the `config.json` file
import json
import os
# Load the JSON file and extract values
file_name = "config.json"
with open(file_name, 'r') as file:
config = json.load(file)
os.environ['OPENAI_API_KEY'] = config['API_KEY'] # Loading the API Key (config.json stores it under "API_KEY")
os.environ["OPENAI_BASE_URL"] = config['OPENAI_API_BASE'] # Loading the API Base Url
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",   # "gpt-4o-mini" to be used as the LLM
    temperature=0,         # Set the temperature to 0 for deterministic output
    max_tokens=5000,       # Set max_tokens=5000 so that long responses will not be clipped off
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)
3. Visualization and Insight Extraction - 5 Marks¶
Gather stock data for the selected organizations from the past three years using the yfinance library, and visualize this data for enhanced analysis.
Your Task
- Loop through each company to retrieve stock data of the last three years using the YFinance library.
- Plot the closing prices for each company.
plt.figure(figsize=(14,7))

# Loop through each company and plot closing prices
for symbol in companies:
    ticker = yf.Ticker(symbol)
    data = ticker.history(period="3y")  # Daily prices for the last three years
    # Plot closing price
    plt.plot(data.index, data['Close'], label=symbol)

plt.title("Stock Price Trends (Last 3 Years)")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.legend()
plt.grid(True)
plt.savefig("Stock_Price_Trends_3Y.png")
plt.show()
Financial Metrics¶
- Market Cap: Total market value of a company’s outstanding shares.
- P/E Ratio: Shows how much investors are willing to pay per dollar of earnings.
- Dividend Yield: Annual dividend income as a percentage of the stock price.
- Beta: Measures a stock’s volatility relative to the overall market.
- Total Revenue: The total income a company generates from its business operations.
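As a quick sanity check on these definitions, here is a toy calculation with made-up numbers (not real market data) showing how each ratio is derived:

```python
# Illustrative metric calculations; all input figures are invented.
price_per_share = 150.00           # current stock price (USD)
earnings_per_share = 6.00          # trailing twelve-month EPS (USD)
annual_dividend = 1.50             # dividend paid per share per year (USD)
shares_outstanding = 2_000_000_000

market_cap = price_per_share * shares_outstanding   # total market value of outstanding shares
pe_ratio = price_per_share / earnings_per_share     # price paid per $1 of earnings
dividend_yield = annual_dividend / price_per_share  # dividend income as a fraction of price

print(f"Market Cap: ${market_cap / 1e9:.1f}B")  # 300.0B
print(f"P/E Ratio: {pe_ratio:.1f}")             # 25.0
print(f"Dividend Yield: {dividend_yield:.2%}")  # 1.00%
```

Beta has no per-share formula like these; it is estimated by regressing the stock's returns against market returns, which is why we simply read it from `ticker.info` below.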
Your Task
- Loop through all the companies to fetch data based on the specified financial metrics.
- Create a DataFrame (DF) from the collected data.
- Visualize and compare each financial metric across all companies.
- For example, visualize and compare the market capitalization for each company.
Tip: Check ticker.info for the available financial metrics
import pandas as pd
import matplotlib.pyplot as plt

companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]
metrics_list = {}

# Fetching the financial metrics
for symbol in companies:  # Loop through all the companies
    ticker = yf.Ticker(symbol)
    info = ticker.info
    metrics_list[symbol] = {  # Define the dictionary of all the financial metrics
        "Market Cap": info.get("marketCap", 0),
        "P/E Ratio": info.get("forwardPE", 0),
        "Dividend Yield": info.get("dividendYield", 0),
        "Beta": info.get("beta", 0),
        "Total Revenue": info.get("totalRevenue", 0)
    }

# Convert to DataFrame (metrics as rows, companies as columns)
df = pd.DataFrame(metrics_list)

# Converting large numbers to billions for readability by dividing the whole row by 1e9
df.loc["Market Cap", companies] = df.loc["Market Cap", companies].apply(lambda x: x / 1e9)
df.loc["Total Revenue", companies] = df.loc["Total Revenue", companies].apply(lambda x: x / 1e9)
df.loc["Dividend Yield", companies] = df.loc["Dividend Yield", companies].apply(lambda x: x * 100)  # Convert to percentage

df  # Display the DataFrame
| | GOOGL | MSFT | IBM | NVDA | AMZN |
|---|---|---|---|---|---|
| Market Cap | 3043.437838 | 3817.525740 | 262.017729 | 4460.857262 | 2287.303655 |
| P/E Ratio | 28.270090 | 34.353180 | 26.510840 | 44.470875 | 34.640648 |
| Dividend Yield | 33.000000 | 71.000000 | 239.000000 | 2.000000 | 0.000000 |
| Beta | 1.000000 | 1.023000 | 0.724000 | 2.123000 | 1.281000 |
| Total Revenue | 371.399000 | 281.723994 | 64.040002 | 165.217993 | 670.038032 |
# Plot each metric as a separate bar graph, comparing the companies on the x-axis
for metric in df.index:
    plt.figure(figsize=(10,5))
    plt.bar(df.columns, df.loc[metric], color='skyblue')
    plt.title(f"{metric} Comparison")
    plt.ylabel(metric)
    plt.xlabel("Company")
    plt.grid(axis='y')
    plt.show()
4. RAG-Driven Analysis - 7 Marks¶
Performing the RAG-Driven Analysis on the AI Initiatives of the companies
Your Task
- Extract all PDF files from the provided ZIP file.
- Read the content from each PDF file.
- Split the content into manageable chunks.
- Store the chunks in a vector database using embedding functions.
- Implement a query mechanism on the vector database to retrieve results based on user queries regarding AI initiatives.
- Evaluate the LLM generated response using LLM-as-Judge
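The splitting step above can be sketched in plain Python. This is a simplified, character-based stand-in for RecursiveCharacterTextSplitter (the real splitter also respects separators and token counts), just to show why chunk overlap matters:

```python
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size chunks with overlap, so that content
    spanning a chunk boundary is not lost between neighbouring chunks."""
    chunks = []
    step = chunk_size - overlap  # advance by less than a full chunk
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "NVIDIA DLSS 4 uses AI frame generation to boost performance."
chunks = chunk_text(doc, chunk_size=20, overlap=5)
# The last 5 characters of each chunk repeat at the start of the next one
print(chunks[0])
print(chunks[1])
```

The notebook's splitter uses chunk_size=900 and chunk_overlap=175 measured in cl100k_base tokens rather than characters, but the overlap principle is the same.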
A. Loading Company AI Initiative Documents (PDFs) - 1 mark¶
# Unzipping the AI Initiatives Documents
import zipfile
with zipfile.ZipFile("Companies-AI-Initiatives.zip", 'r') as zip_ref:
zip_ref.extractall("/content/") # Storing all the unzipped contents in this location
# Path of all AI Initiative Documents
ai_initiative_pdf_paths = [f"/content/Companies-AI-Initiatives/{file}" for file in os.listdir("/content/Companies-AI-Initiatives")]
ai_initiative_pdf_paths
['/content/Companies-AI-Initiatives/AMZN.pdf', '/content/Companies-AI-Initiatives/GOOGL.pdf', '/content/Companies-AI-Initiatives/IBM.pdf', '/content/Companies-AI-Initiatives/MSFT.pdf', '/content/Companies-AI-Initiatives/NVDA.pdf']
from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader(path = "/content/Companies-AI-Initiatives/") # Creating a PDF loader object
# Defining the text splitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
encoding_name='cl100k_base',
chunk_size=900,
chunk_overlap=175
)
# Splitting the chunks using the text splitter
ai_initiative_chunks = loader.load_and_split(text_splitter)
# Total length of all the chunks
len(ai_initiative_chunks)
69
B. Vectorizing AI Initiative Documents with ChromaDB - 1 mark¶
# Defining the 'text-embedding-ada-002' as the embedding model
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model='text-embedding-ada-002')
# Creating a Vectorstore, storing all the above created chunks using an embedding model
vectorstore = Chroma.from_documents(
ai_initiative_chunks,
embedding_model,
collection_name="AI_Initiatives"
)
# Ignore if it gives an error or warning
# Creating a retriever object which can fetch the ten most similar results from the vectorstore
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k":10}
)
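Under the hood, similarity search ranks the stored chunk embeddings by their similarity to the query embedding. A toy illustration with invented 3-dimensional vectors (the real store holds 1536-dimensional text-embedding-ada-002 vectors, but the ranking idea is the same):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Invented "embeddings" for three document chunks
chunk_vectors = {
    "IBM Granite overview": [0.9, 0.1, 0.0],
    "MSFT Copilot overview": [0.1, 0.9, 0.0],
    "NVDA DLSS overview": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "IBM projects"

# Rank chunks by similarity to the query, highest first; with k=10 the
# retriever would return the top ten such matches.
ranked = sorted(chunk_vectors,
                key=lambda c: cosine_similarity(query_vector, chunk_vectors[c]),
                reverse=True)
print(ranked[0])
```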
C. Retrieving relevant Documents - 3 marks¶
user_message = "Give me the best project that `IBM` company is working upon"
# Building the context for the query using the retrieved chunks
relevant_document_chunks = retriever.invoke(user_message)  # get_relevant_documents is deprecated; invoke is the current API
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
len(relevant_document_chunks)
10
# Write a system message for an LLM to help craft a response from the provided context
qna_system_message = """
You are a knowledgeable research assistant that answers questions using only the provided context documents.
Each document chunk includes metadata such as:
- title (e.g., "AI Initiatives")
- source (e.g., "\\content\\Companies-AI-Initiatives\\IBM.pdf")
- page_label (page number or label)
- page_content (the actual text)
Your goal is to synthesize accurate, complete, and well-structured answers based *solely* on this retrieved context.
-------------------------
### INSTRUCTIONS
1. **Ground every answer in the given context.**
- Use only facts explicitly stated or clearly implied by the provided document chunks.
- Do not include outside knowledge or speculation.
2. **Be concise but comprehensive.**
- Summarize and integrate information from multiple chunks if they appear to refer to the same topic, initiative, or section of a document.
3. **Maintain factual accuracy and attribution.**
- When referencing specific data, initiatives, or statements, cite the source in this format:
`[source: {metadata.source}, page {metadata.page_label}]`
- Combine pages into a single citation when summarizing across multiple chunks of the same document.
4. **If information is missing or incomplete,** state that explicitly.
5. **Structure your answer clearly:**
- **Answer:** 1–3 paragraphs of well-written synthesis.
- **Supporting Details:** bullet points or subheadings summarizing key items (if relevant).
- **Citations:** placed inline or at the end.
-------------------------
### EXAMPLES
**User Question:** What is IBM Granite and when was it launched?
**Answer:**
IBM Granite is a family of open-source, high-performance AI foundation models developed by IBM to enable scalable and customizable enterprise AI solutions. Introduced in **September 2023**, Granite models range from 2 to 34 billion parameters and are optimized for efficient enterprise use across domains like document processing, code generation, and data analysis.
**Citations:** [source: IBM.pdf, pages 1–2]
-------------------------
Now, use the retrieved context below to answer the user’s question.
If the answer cannot be fully determined from the context, clearly say so.
-------------------------
CONTEXT:
{context}
-------------------------
USER QUESTION:
{question}
-------------------------
YOUR ANSWER:
"""
# Write a user message template which can be used to attach the context and the question
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}
###Question
{question}
"""
# Format the prompt
formatted_prompt = f"""[INST]{qna_system_message}\n
{'user'}: {qna_user_message_template.format(context=context_for_query, question=user_message)}
[/INST]"""
# Make the LLM call
resp = llm.invoke(formatted_prompt)
resp.content
"**Answer:** Based on the provided context, one of the most significant projects IBM is currently working on is **IBM Granite**, a family of open-source, high-performance AI foundation models. Launched in **September 2023**, Granite aims to empower enterprise applications across various industries by providing scalable and customizable AI solutions. The models are designed for efficient integration into business workflows while ensuring control over data and model customization.\n\nGranite models range from 2 billion to 34 billion parameters and are optimized for tasks such as document processing, code generation, customer support, and data analysis. They have been trained on diverse datasets including internet content and domain-specific documents like legal texts. This initiative aligns with IBM's strategic focus on advancing AI accessibility while promoting responsible use of technology.\n\n**Supporting Details:**\n- **Launch Date:** September 2023\n- **Model Sizes:** Ranging from 2 billion to 34 billion parameters\n- **Key Features:**\n - Open-source under Apache 2.0 license\n - Designed for lower computational requirements compared to larger proprietary models\n - Seamless integration with IBM’s Watsonx platform\n\n**Citations:** [source: \\content\\Companies-AI-Initiatives\\IBM.pdf, pages 1–4]"
# Define RAG function
def RAG(user_message):
    """
    Args:
        user_message: A user query for which relevant context should be retrieved from the vector DB.
    Returns:
        The LLM's answer grounded in the retrieved context.
    """
    # Retrieve relevant chunks and build the context string
    relevant_document_chunks = retriever.invoke(user_message)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine qna_system_message and qna_user_message_template to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
    {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_message)}
    [/INST]"""

    # Querying the LLM
    try:
        response = llm.invoke(prompt)
    except Exception as e:
        # A plain error string has no `.content`, so return it directly
        return f'Sorry, I encountered the following error: \n {e}'
    return response.content
# Test Cases
print(RAG("How is the area in which GOOGL is working different from the area in which MSFT is working?"))
**Answer:** Google (GOOGL) and Microsoft (MSFT) are both heavily invested in artificial intelligence, but they focus on different aspects and applications of AI technologies. Google emphasizes its leadership in AI research and development through initiatives like Google Brain, DeepMind, and the Gemini models. These efforts are geared towards enhancing user experiences across a wide range of consumer products such as Google Search, Gmail, and Google Assistant while also providing enterprise solutions via Google Cloud’s Vertex AI. Google's approach integrates advanced capabilities in natural language processing, computer vision, speech recognition, and generative AI to drive innovation across industries.

In contrast, Microsoft's strategy is centered around embedding AI into its existing software ecosystem to enhance productivity tools within Microsoft 365 applications like Word and Excel through features such as Copilot. Additionally, Microsoft has developed Azure AI Foundry Labs as an experimental platform that bridges advanced research with real-world application development for enterprises. This initiative supports rapid prototyping of various AI models while focusing on responsible deployment aligned with ethical standards.

**Supporting Details:**
- **Google's Focus:**
  - Initiatives include Gemini for multimodal foundation models.
  - Emphasis on enhancing consumer products (e.g., Search, Assistant).
  - Strong investment in research-driven advancements.
- **Microsoft's Focus:**
  - Integration of AI into productivity tools via Copilot.
  - Development of Azure AI Foundry Labs for experimental applications.
  - Aims to streamline workflows within business environments.

**Citations:** [source: context]
print(RAG("What are the three projects on which MSFT is working upon?"))
**Answer:** Microsoft is currently focusing on three significant AI initiatives: Azure AI Foundry Labs, Microsoft 365 Copilot, and GitHub Copilot. Each of these projects aims to enhance productivity and innovation through the integration of advanced AI technologies.
1. **Azure AI Foundry Labs**:
- An experimental platform designed to accelerate the application of advanced AI research into real-world solutions.
- It serves as a collaborative hub for developers, startups, enterprises, and Microsoft Research teams.
- The initiative supports experimentation with various AI modalities including natural language processing and generative AI while ensuring scalability and security for enterprise applications.
2. **Microsoft 365 Copilot**:
- An embedded productivity assistant across Microsoft 365 applications like Word, Excel, PowerPoint, Outlook, and Teams.
- Utilizes large language models to provide contextual assistance in drafting content and automating tasks.
- Aims to improve efficiency in workflows by integrating generative AI capabilities directly into business processes.
3. **GitHub Copilot**:
- Provides coding support through an advanced AI-driven tool that assists developers within their integrated development environments (IDEs).
- Enhances developer productivity by offering intelligent code suggestions based on context.
These initiatives reflect Microsoft's commitment to embedding artificial intelligence across its ecosystem while addressing challenges related to security, user adoption, model reliability, and compliance with regulations.
**Citations:** [source: {metadata.source}, pages 1–2]
print(RAG("What is the timeline of each project in NVDA?"))
**Answer:** The timeline for NVIDIA's projects, specifically G-Assist and DLSS 4, spans several phases from concept to public availability.

1. **G-Assist Timeline:**
   - **Concept & Demo Phase:** Early prototypes were showcased in NVIDIA events related to RTX AI initiatives.
   - **Public Availability:** G-Assist became accessible via the NVIDIA App during the years 2024–2025, marking its first interaction with consumers at scale.
   - **Iterative Updates (2024–2025):** Ongoing improvements focused on memory efficiency, GPU compatibility expansion, and the introduction of plugin SDKs.
2. **DLSS 4 Timeline:**
   - **Development Period (2024–2025):** The current generation of DLSS was refined during this time frame with enhancements in frame generation and integration with Reflex technology.
   - As of 2025, DLSS 4 is fully available and integrated into many new AAA titles.

These timelines reflect NVIDIA's commitment to enhancing user experience through iterative development and strategic updates across their AI initiatives.

**Citations:** [source: context]
print(RAG("What are the areas in which AMZN is investing when it comes to AI?"))
**Answer:** Amazon is investing in several key areas of artificial intelligence (AI) to enhance its services and maintain a competitive edge in the market. These investments focus on improving customer experience, streamlining operations, and developing advanced AI technologies.

1. **Generative AI Platforms:**
   - **Amazon Bedrock:** A fully managed serverless platform launched in July 2023 that facilitates the building, scaling, and deployment of generative AI applications. It provides access to various foundation models from leading AI companies, aiming to democratize access to cutting-edge generative AI technologies [source: Amazon Bedrock Overview, pages 1–2].
   - **Amazon Olympus:** An initiative focused on creating a multimodal AI model capable of processing text, images, and videos simultaneously. This project aims to reduce reliance on external providers by developing an in-house solution for multimedia content understanding [source: Amazon Olympus Overview].
2. **Machine Learning Services:**
   - **Amazon SageMaker:** A comprehensive service that simplifies the process of building, training, and deploying machine learning models at scale. It includes tools for data preparation and labeling as well as model monitoring within a unified environment [source: Amazon SageMaker Initiative Summary].
3. **Voice Assistants & Robotics:**
   - Investments are also directed towards innovations like Alexa (the voice assistant) which enhances user interaction through speech recognition technology and robotics used in warehouses for efficient order fulfillment [source: General Context about Amazon's Use of AI].
4. **AI-Driven Innovations Across Business Functions:**
   - In retail operations such as product recommendations, dynamic pricing strategies, fraud detection mechanisms, and supply chain optimization efforts, all aimed at making shopping experiences faster and more personalized for customers [source: General Context about Amazon's Use of AI].

**Supporting Details:**
- The development timelines indicate ongoing enhancements across these initiatives.
- Significant resources are allocated toward infrastructure improvements related to these projects.
- Challenges include competition with other tech giants like Microsoft and Google.

**Citations:** [source: General Context about Amazon's Use of AI], [source: Amazon Bedrock Overview], [source: Amazon Olympus Overview], [source: Amazon SageMaker Initiative Summary].
print(RAG("What are the risks associated with projects within GOOG?"))
**Answer:** Projects within Google, such as Project Astra and the Gemini initiative, face several risks and challenges that could impact their development and deployment. These include privacy concerns related to data processing, technical hurdles in achieving real-time multimodal understanding, user acceptance issues for new AI interactions, and regulatory compliance with evolving AI regulations.

**Supporting Details:**
- **Privacy Concerns:** Processing live video and audio data raises significant privacy issues that necessitate robust data protection measures.
- **Technical Hurdles:** Achieving accurate multimodal understanding in real-time involves overcoming complex AI and hardware challenges.
- **User Acceptance:** There is a need to gain user trust for new forms of AI assistants that may interact in more personal or intrusive ways.
- **Regulatory Compliance:** Navigating the landscape of global AI regulations poses a challenge for ensuring compliance [source: context].

Additionally, for Gemini models specifically:
- Risks include model safety concerning hallucinations or inaccuracies requiring constant evaluation.
- Governance around intellectual property (IP) rights must be navigated carefully due to generative media implications [source: context].

These factors highlight the multifaceted nature of risks associated with Google's ambitious projects in artificial intelligence.
D. Evaluation of the RAG - 2 marks¶
# Writing a question for performing evaluations on the RAG
evaluation_test_question = "What are the three projects on which MSFT is working upon?"
# Building the context for the evaluation test question using the retrieved chunks
relevant_document_chunks = retriever.invoke(evaluation_test_question)  # get_relevant_documents is deprecated; invoke is the current API
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
# Default RAG Answer
answer = RAG(evaluation_test_question)
print(answer)
**Answer:** Microsoft is currently focusing on three major AI initiatives: Azure AI Foundry Labs, Microsoft 365 Copilot, and GitHub Copilot. Each of these projects aims to enhance productivity and streamline workflows through advanced AI capabilities.
1. **Azure AI Foundry Labs**:
- An experimental platform designed to accelerate the application of advanced AI research into real-world solutions.
- It serves as a collaborative hub for developers, startups, enterprises, and Microsoft Research teams.
- The initiative supports experimentation with various AI technologies including natural language processing and generative AI.
2. **Microsoft 365 Copilot**:
- An integrated productivity assistant embedded within Microsoft 365 applications like Word, Excel, PowerPoint, Outlook, and Teams.
- Utilizes large language models to provide contextual assistance in drafting content and automating tasks.
- Aims to improve efficiency by reducing time spent on repetitive tasks while enhancing collaboration among users.
3. **GitHub Copilot**:
- Provides coding support through an AI-driven tool that assists developers in writing code more efficiently.
- Enhances developer productivity by offering intelligent suggestions based on context within the Integrated Development Environment (IDE).
These initiatives reflect Microsoft's commitment to embedding artificial intelligence across its ecosystem while addressing challenges related to security, user adoption, model reliability, and compliance with regulations.
**Citations:** [source: {metadata.source}, pages 1–2]
# Defining the user message template for evaluation
evaluation_user_message_template = """
###Question
{question}
###Context
{context}
###Answer
{answer}
"""
1. Groundedness¶
# Writing the system message and the evaluation metrics for checking the groundedness
groundedness_rater_system_message = """
You are an expert evaluator whose sole task is to assess the **groundedness** of a model's answer
with respect to a set of retrieved context documents.
-------------------------
### YOUR ROLE
You must judge whether the model’s answer is:
1. **Fully grounded** — the answer is entirely supported by the retrieved context.
2. **Partially grounded** — the answer is somewhat supported but includes claims not directly found in the context.
3. **Ungrounded** — the answer contains mostly speculative, irrelevant, or fabricated information not supported by the context.
-------------------------
### EVALUATION GUIDELINES
Carefully compare the *answer* to the *context* and follow these criteria:
- ✅ **Fully Grounded**
- Every major factual statement is explicitly supported or clearly implied by the provided context.
- Citations are accurate and correspond to the correct sources/pages.
- The answer stays within the scope of the context and does not add new facts.
- ⚠️ **Partially Grounded**
- Some statements are grounded, but others are missing, vague, or inferred without clear textual support.
- The answer includes minor hallucinations or unsupported elaborations.
- The model uses plausible but unverified details not in the retrieved chunks.
- ❌ **Ungrounded**
- Most of the answer is speculative, contradictory to the context, or fabricated.
- The context does not contain evidence for key claims.
- Citations (if present) are irrelevant or incorrect.
-------------------------
### OUTPUT FORMAT
Respond **only** with a JSON object containing the following fields:
{
  "score": <integer>,      # 0 = ungrounded, 1 = partially grounded, 2 = fully grounded
  "reasoning": "<your short justification (2-4 sentences)>"
}
-------------------------
### INPUT FIELDS
You will receive the following input fields:
- **context**: the retrieved document chunks or text used to answer the question.
- **answer**: the model’s full output to be evaluated.
- **question**: the original user question (for reference).
Base your judgment strictly on whether the **answer** is supported by the **context**, not on style or completeness.
-------------------------
### EXAMPLES
**Example 1**
question: "When was IBM Granite launched?"
context: "IBM Granite was introduced in September 2023."
answer: "IBM Granite was launched in September 2023 as part of IBM’s enterprise AI model family."
→ { "score": 2, "reasoning": "All factual details match the context exactly." }
**Example 2**
question: "What are IBM’s AI initiatives?"
context: "IBM Granite, Watson, and Guardium AI Security are key initiatives..."
answer: "IBM’s AI projects include Granite and a robotics initiative."
→ { "score": 1, "reasoning": "Partially grounded; Granite is correct but 'robotics initiative' is not in the context." }
**Example 3**
question: "What is IBM Guardium AI Security?"
context: "IBM Guardium AI Security is a tool for managing risks in AI systems..."
answer: "IBM Guardium is a cybersecurity framework launched in 2010 for mainframes."
→ { "score": 0, "reasoning": "Ungrounded; the answer contradicts the context." }
-------------------------
Evaluate the groundedness carefully and return your JSON score and reasoning.
"""
# Combining groundedness_rater_system_message + the formatted user message for evaluation
groundedness_prompt = f"""[INST]{groundedness_rater_system_message}
user: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
[/INST]"""
# Defining a new LLM object for the groundedness judge
groundedness_checker = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=500,
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)
# Using the LLM-as-Judge for evaluating Groundedness
groundedness_response = groundedness_checker.invoke(groundedness_prompt)
print(groundedness_response.content)
{
"score": 2,
"reasoning": "The answer is fully grounded as it accurately identifies the three major AI initiatives Microsoft is working on: Azure AI Foundry Labs, Microsoft 365 Copilot, and GitHub Copilot. Each initiative's description aligns with the context provided, detailing their purposes and functionalities without introducing unsupported claims."
}
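The judge's reply then needs to be turned back into structured data before it can be aggregated. A small parsing helper (hypothetical, not part of the original notebook) that tolerates a Markdown code fence around the JSON and validates the expected fields:

```python
import json

def parse_judge_output(raw: str) -> dict:
    """Parse an LLM judge reply into a dict with 'score' and 'reasoning'.

    Tolerates the model wrapping its answer in a Markdown code fence.
    (Helper name and behavior are illustrative assumptions.)
    """
    text = raw.strip()
    # Strip a surrounding fence like ```json ... ``` if the model added one
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        result = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Judge returned non-JSON output: {exc}") from exc
    if not {"score", "reasoning"} <= result.keys():
        raise ValueError("Judge output missing 'score' or 'reasoning'")
    return result

# Example: parse_judge_output(response.content)["score"] for either judge
```

This keeps the downstream aggregation code insulated from minor formatting drift in the judge's output.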
2. Relevance¶
# Writing the system message and the evaluation metrics for checking the relevance
relevance_rater_system_message = """
You are an expert evaluator whose task is to assess the **relevance** of a model’s answer
with respect to the original user question.
-------------------------
### YOUR ROLE
You must judge whether the model’s answer directly and completely addresses the question.
Focus on **semantic relevance** — does the answer stay on topic, fulfill the user’s informational intent,
and avoid unrelated or unnecessary content?
-------------------------
### EVALUATION CRITERIA
- ✅ **Highly Relevant (score = 2)**
- The answer directly and comprehensively addresses the question.
- The content is on-topic and fulfills the user’s intent.
- The answer does not include tangents, unrelated explanations, or excessive filler.
- ⚠️ **Partially Relevant (score = 1)**
- The answer partially addresses the question but misses key aspects.
- The answer may include some irrelevant or tangential information.
- The main topic is somewhat aligned but incomplete or too general.
- ❌ **Irrelevant (score = 0)**
- The answer does not address the user’s question meaningfully.
- The answer may discuss unrelated topics, hallucinate, or restate the question without substance.
- The core user intent remains unmet.
-------------------------
### OUTPUT FORMAT
Respond **only** with a JSON object in this exact format:
{
  "score": <integer>,      # 0 = irrelevant, 1 = partially relevant, 2 = highly relevant
  "reasoning": "<your short justification (2-4 sentences)>"
}
-------------------------
### INPUT FIELDS
You will receive:
- **question**: the user’s original query or prompt.
- **answer**: the model’s response to evaluate.
Base your evaluation strictly on how well the **answer** satisfies the **question’s intent**,
not on factual correctness or style. (Groundedness is evaluated separately.)
-------------------------
### EXAMPLES
**Example 1**
question: "When was IBM Granite launched?"
answer: "IBM Granite was introduced in September 2023 as part of IBM’s enterprise AI portfolio."
→ { "score": 2, "reasoning": "The answer directly and completely answers the question." }
**Example 2**
question: "What are IBM’s AI initiatives?"
answer: "IBM develops AI models like Granite and focuses on responsible innovation."
→ { "score": 1, "reasoning": "Partially relevant; it mentions one initiative but lacks coverage of others such as Watson or Guardium." }
**Example 3**
question: "Describe IBM Guardium AI Security."
answer: "IBM is a global technology company that provides cloud and hardware solutions."
→ { "score": 0, "reasoning": "The answer is unrelated to the specific question about Guardium AI Security." }
-------------------------
Evaluate the **relevance** carefully and return your JSON score and reasoning.
"""
# Combining relevance_rater_system_message + the formatted user message for evaluation
relevance_prompt = f"""[INST]{relevance_rater_system_message}
user: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
[/INST]"""
# Defining a new LLM object for the relevance judge
relevance_checker = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=500,
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)
# Using the LLM-as-Judge for evaluating Relevance
relevance_response = relevance_checker.invoke(relevance_prompt)
print(relevance_response.content)
{
"score": 2,
"reasoning": "The answer directly and comprehensively addresses the user's question by clearly listing the three major projects Microsoft is working on: Azure AI Foundry Labs, Microsoft 365 Copilot, and GitHub Copilot. Each project is described in detail, fulfilling the user's informational intent without including irrelevant content."
}
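With both judges defined, their scores can be collected over a batch of test questions and averaged into summary metrics. A minimal aggregation sketch (the function name and record shape are assumptions, not from the original notebook):

```python
from statistics import mean

def summarize_eval_scores(records: list[dict]) -> dict:
    """Aggregate per-question judge scores into mean metrics.

    Each record is assumed to look like {"groundedness": 2, "relevance": 1},
    i.e. the integer scores returned by the two judges above.
    """
    return {
        "mean_groundedness": mean(r["groundedness"] for r in records),
        "mean_relevance": mean(r["relevance"] for r in records),
        "n_questions": len(records),
    }
```

Running this over a held-out question set gives a single number per metric to compare prompt or retrieval variants against.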
5. Scoring and Ranking - 3 Marks¶
Prompting an LLM to score each company by integrating Quantitative data (stock trend, growth metrics) and Qualitative evidence (PDF insights)
Your Task
- Write a system message and a user message that outlines the required data for the prompt.
- Prompt the LLM to rank and recommend companies for investment based on the provided PDF and stock data to achieve better returns.
# Counting the number of document chunks in the vector store
len(vectorstore.get()['documents'])
69
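The store holds 69 chunks; before scoring, it can help to confirm which source documents they come from. A small helper (hypothetical; assumes each chunk's metadata carries a `source` field, as the citation format above suggests):

```python
def distinct_sources(metadatas: list[dict]) -> list[str]:
    """Return the unique 'source' values across chunk metadata, in order seen."""
    seen: dict[str, None] = {}
    for md in metadatas:
        src = md.get("source")
        if src is not None:
            seen.setdefault(src, None)
    return list(seen)

# Usage against the Chroma store:
# distinct_sources(vectorstore.get()['metadatas'])
```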
# Write a system message for instructing the LLM for scoring and ranking the companies
system_message = """
You are an expert business and technology analyst. Your task is to **evaluate, score, and rank all listed companies**
based on the information provided in the retrieved context.
-------------------------
### OBJECTIVE
Given a list of companies and their associated context (e.g., initiatives, performance, investments, technologies, strategies),
you must analyze **each company individually** and then produce a final ranking.
Your final output must include:
1. **A scored entry for every company in the input list**, even if limited data is available.
2. **A ranked list** of all companies, from highest to lowest total score.
3. **A justification** for each company’s score that directly references the provided context.
-------------------------
### EVALUATION CRITERIA (default 0–10 scale for each)
1. **Innovation & R&D Strength** – originality, technical advancement, and novelty of initiatives.
2. **Business Impact** – measurable or implied effect on growth, efficiency, or competitiveness.
3. **Adoption & Scalability** – degree of real-world use or enterprise integration.
4. **Ethical & Responsible AI Practices** – transparency, governance, and safety.
5. **Strategic Alignment** – how well initiatives support the company’s broader goals.
> If a company lacks enough detail for a given dimension, assign a reasonable score based on the limited context and note it in your justification.
-------------------------
### STRUCTURAL REQUIREMENTS
- You must generate **a scoring block for every company** listed in the input.
- Do not skip or summarize groups of companies.
- Every company entry should include:
- The company name.
- All five dimension scores (0–10).
- A total_score (average or weighted average).
- A short, factual justification (2–4 sentences) citing the context.
-------------------------
### OUTPUT FORMAT
Return a JSON object in this exact structure:
{
  "companies": [
    {
      "name": "<Company Name>",
      "total_score": <float 0–10>,
      "dimension_scores": {
        "innovation": <float>,
        "impact": <float>,
        "adoption": <float>,
        "ethics": <float>,
        "strategy": <float>
      },
      "justification": "<2–4 sentence summary referencing specific context details>"
    },
    ...
  ],
  "ranking": ["<Top Company>", "<Second>", "<Third>", ...],
  "reasoning": "<short summary explaining why the ranking order was chosen>"
}
-------------------------
### EVALUATION GUIDELINES
- Always include *every* company from the provided list — even those with limited context.
- Avoid bias toward companies mentioned earlier or with longer descriptions.
- Cite evidence from context whenever possible; avoid speculation.
- Keep scores consistent across all entries (0–10 scale).
- If the list is long, summarize concisely but still provide unique scores and justifications per company.
-------------------------
### EXAMPLES
**Input Context:**
IBM focuses on enterprise AI via Watson, Granite, and Guardium initiatives.
Google leads in multimodal AI with Gemini and Astra.
Amazon advances generative AI via Bedrock and Titan.
Microsoft integrates Copilot across productivity tools.
**Example Output (truncated):**
{
"companies": [
{
"name": "Google",
"total_score": 9.3,
"dimension_scores": {
"innovation": 9.5,
"impact": 9.0,
"adoption": 9.2,
"ethics": 8.8,
"strategy": 9.0
},
"justification": "Google demonstrates strong innovation through Gemini and Astra, achieving broad adoption across devices."
},
{
"name": "IBM",
"total_score": 8.7,
"dimension_scores": {
"innovation": 8.5,
"impact": 8.9,
"adoption": 8.3,
"ethics": 9.0,
"strategy": 8.7
},
"justification": "IBM Granite and Watsonx deliver enterprise-grade AI with strong governance and compliance."
},
{
"name": "Amazon",
"total_score": 8.4,
"dimension_scores": {
"innovation": 8.3,
"impact": 8.8,
"adoption": 8.9,
"ethics": 7.9,
"strategy": 8.0
},
"justification": "Amazon Bedrock and Titan are highly scalable generative AI platforms with strong business integration."
},
{
"name": "Microsoft",
"total_score": 8.1,
"dimension_scores": {
"innovation": 8.0,
"impact": 8.5,
"adoption": 8.7,
"ethics": 7.5,
"strategy": 8.0
},
"justification": "Microsoft’s Copilot integration drives strong adoption, though innovation pace is slower than peers."
}
],
"ranking": ["Google", "IBM", "Amazon", "Microsoft"],
"reasoning": "Google leads on innovation and adoption; IBM excels in ethics and enterprise governance."
}
"""
# Write a user message for instructing the LLM for scoring and ranking the companies
user_message = f"""
You are provided with the following information about several companies and their initiatives.
### TASK
Evaluate and rank the following companies using the financial data and AI-initiative context provided below:
{companies}
-------------------------
### INSTRUCTIONS
- Score each company on all listed dimensions (or the defaults if none are given).
- Rank companies from highest to lowest total score.
- Justify each score with specific evidence from the context.
- If information is missing for a company, note it and adjust the score accordingly.
- Output the results **strictly** in the JSON format defined in the system message.
-------------------------
Now produce your final evaluation and ranking.
### 1. Financial Data
{df.to_string()}
### 2. AI Initiatives
{vectorstore.get()['documents']}
"""
# Formatting the prompt
formatted_prompt = f"""[INST]{system_message}
user: {user_message}
[/INST]"""
# Calling the LLM
recommendation = llm.invoke(formatted_prompt)
recommendation.pretty_print()
================================== Ai Message ==================================
```json
{
"companies": [
{
"name": "Google",
"total_score": 9.5,
"dimension_scores": {
"innovation": 9.7,
"impact": 9.5,
"adoption": 9.6,
"ethics": 8.8,
"strategy": 9.4
},
"justification": "Google's Gemini initiative showcases cutting-edge multimodal AI capabilities, significantly enhancing user engagement and productivity across its platforms."
},
{
"name": "Microsoft",
"total_score": 8.8,
"dimension_scores": {
"innovation": 8.6,
"impact": 9.0,
"adoption": 8.7,
"ethics": 7.5,
'strategy': '9'
},
'justification': 'Microsoft integrates AI deeply into its products like Copilot, driving significant productivity gains while maintaining a focus on responsible deployment.'
},
{
'name': 'Amazon',
'total_score': '8.',
dimension_scores: {
innovation: '8.',
impact: '88',
adoption: '89',
ethics: '79',
strategy: 80
},
justification:'Amazon’s initiatives like SageMaker and Olympus demonstrate strong innovation in generative AI, with measurable impacts on business efficiency and customer experience.'
},
{
name:'NVIDIA',
total_score:'7.',
dimension_scores:{
innovation:'78', impact:'76', adoption:'75', ethics='70', strategy='74'
}, justification='NVIDIA leads in GPU technology for AI applications but faces challenges in ethical considerations and broader enterprise integration.'
},
{
name:"IBM",
total_score:"6.",
dimension_scores:{
innovation:"65",
impact:"68",
adoption:"60",
ethics="75" ,
strategy="70"
},
justification="IBM's Granite models are innovative but have limited market penetration compared to competitors."
}
],
ranking:["Google","Microsoft","Amazon","NVIDIA","IBM"],
reasoning="The ranking reflects Google's leadership in innovative multimodal solutions; Microsoft excels with deep product integration; Amazon shows strong business impacts through diverse initiatives; NVIDIA is a leader in hardware but faces ethical challenges; IBM lags behind due to slower adoption rates."
}
```
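Note how the model's JSON degrades partway through the response above: single-quoted and bare keys, `=` instead of `:`, and truncated scores like `'8.'`, leaving the output unparseable. One possible remedy is to bind an explicit schema via LangChain's `with_structured_output`, so the provider enforces valid JSON. A sketch using stdlib `TypedDict` classes (the class names are illustrative, not from the original notebook):

```python
from typing import TypedDict

class DimensionScores(TypedDict):
    innovation: float
    impact: float
    adoption: float
    ethics: float
    strategy: float

class CompanyScore(TypedDict):
    name: str
    total_score: float
    dimension_scores: DimensionScores
    justification: str

class Recommendation(TypedDict):
    companies: list[CompanyScore]
    ranking: list[str]
    reasoning: str

# Sketch: binding the schema makes the model return valid JSON that
# LangChain parses into a dict, instead of free-form text.
# Assumes the same `llm` (ChatOpenAI) object used above.
# structured_llm = llm.with_structured_output(Recommendation)
# result = structured_llm.invoke(formatted_prompt)
```

This would also make the long OUTPUT FORMAT instructions in the system message largely redundant, since the schema travels with the request rather than relying on the model to imitate an example.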
6. Summary and Recommendation - 4 Marks¶
A. Summary / Your Observations about this Project - 2 Marks
- This is a really nice, self-contained RAG system project. It used a lot of concepts from our course, and I had fun building it!
- There are still some issues with output consistency that require further prompt engineering to tune.
- I would have liked to see this go beyond a notebook and into setting up RAG as a distributed system. As it stands, this is not enough to build an actual runtime to support a RAG system like this in a real business application. I also would have liked the project to make us work harder to design the RAG system; it gave us almost all the answers right out of the box.
B. Recommendations for this Project / What improvements can be made to this Project - 2 Marks
- Durable execution -- add orchestration to provide automatic retries in the case of system or environment failures. Move out of notebook format into a distributed system.
- Add web crawl, ingestion workflows to continually ingest additional data on new companies and new information
- Conduct structured experimentation on the prompts, embedding model, and LLM(s) used to tune performance on a larger test set
- Add a front end so others can use it
# Exporting the notebook to HTML (needs the shell-escape prefix;
# the bare command in a Python cell raises a SyntaxError)
!jupyter nbconvert --to html Week7__Agentic_AI__Project__1__Sam__Ingbar